Running Head: ROBUSTNESS OF LORD’S FORMULAS

Robustness of Lord’s Formulas for Item Difficulty and Discrimination Conversions between Classical and Item Response Theory Models

Authors

  • Teresa Dawber
  • W. Todd Rogers
  • Michael Carbonaro
Abstract

Lord (1980) proposed formulas that provide direct relationships between IRT discrimination and difficulty parameters and conventional item statistics. The purposes of the present study were to determine (1) the veracity of the two formulas within the context that Lord proposed, and (2) the robustness of the formulas beyond the initial and restrictive conditions identified by Lord. Simulated and real achievement data were employed. Results from the simulation study indicate that the a-parameters were recovered quite well for low to moderately discriminating items regardless of ability distribution, and that the b-parameters were recovered quite well for the range typically found for achievement tests. Results for the real data were consistent with those of the simulation study.

The field of psychometrics encompasses different models that offer alternative frameworks for performing test and item analyses. The classical test score theory (CTST) model, the foundation of which was provided by Charles Spearman in 1904, is the traditional means of conducting item and test analyses. The family of item response theory (IRT) models, first introduced by Lord in 1952 for dichotomously scored items, was developed to circumvent the limitations of CTST. Nevertheless, Lord (1980; also see Lord & Novick, 1968) proposed formulas that link the item difficulty and item discrimination indices of the CTST model and the two-parameter IRT model.

Lord (1980, pp. 33-34) stipulated that under certain conditions the difficulty and discrimination indices derived from the two measurement frameworks are connected. That is, the classical item discrimination index may be used to predict the IRT discrimination parameter, and the classical item difficulty index may be used to predict the IRT difficulty parameter.
For item discrimination, to the extent that the number-correct score $x$ is a measure of ability ($\theta$), the biserial correlation between the item and the test score ($\rho'_{ix}$) approximates the correlation between the item and ability ($\rho_{i\theta}$). This association yields a relationship between the conventional biserial item-test correlation and the IRT discrimination index ($a_i$):

$$a_i \cong \frac{\rho'_{ix}}{\sqrt{1 - {\rho'_{ix}}^{2}}}.$$

Therefore, the IRT item discrimination parameter and the biserial correlation are approximately monotonic increasing functions of each other. Lord stated that the relationship is “valid only for the case where θ is normally distributed and there is no guessing” (p. 33). Lord also cautioned that the approximations are crude and may fall short because (1) the test score $x$ contains errors of measurement while $\theta$ does not, and (2) $x$ and $\theta$ have differently shaped distributions, since the relation between $x$ and $\theta$ is nonlinear.

Lord also proposed a monotonic relation between the IRT difficulty index ($b_i$) and the classical difficulty index ($\pi_i$) when all items are equally discriminating. The relationship between the difficulty indices is described as

$$b_i \cong \frac{\gamma_i}{\rho'_{ix}}.$$

The difficulty parameter $b_i$ is proportional to $\gamma_i$, the cut point on the continuous normal distribution underlying the binary item that separates the proportion of incorrect answers ($1 - \pi_i$) from the proportion of correct answers ($\pi_i$). Both $b_i$ and $\gamma_i$ decrease as $\pi_i$ increases.

Review of Studies Using Lord’s Formulas

The formulas provided by Lord (1980) were first presented in Lord and Novick (1968). Although the relationships described are the same, the only qualifying condition in the earlier writing was that $\theta$ be normally distributed with a mean of zero and unit variance. Several studies were conducted in the mid to late 1970s using the formulas, also referred to as the heuristic method, within the framework of the three-parameter model.
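Lord's two approximations, as stated before this review, can be sketched in a few lines of Python. This is a minimal illustration rather than code from the study: the function names `lord_a` and `lord_b` are hypothetical, the a-value is assumed to be on the normal-ogive metric (no 1.7 scaling), and the cut point $\gamma_i$ is assumed to be the standard normal deviate with area $\pi_i$ above it.

```python
from math import sqrt
from statistics import NormalDist


def lord_a(biserial: float) -> float:
    """Approximate IRT discrimination from the item-test biserial correlation.

    Assumption: normal-ogive metric, normal ability, no guessing.
    """
    if not -1.0 < biserial < 1.0:
        raise ValueError("biserial must lie strictly between -1 and 1")
    return biserial / sqrt(1.0 - biserial ** 2)


def lord_b(p_correct: float, biserial: float) -> float:
    """Approximate IRT difficulty from proportion correct and the biserial."""
    if not 0.0 < p_correct < 1.0:
        raise ValueError("p_correct must lie strictly between 0 and 1")
    if biserial == 0.0:
        raise ValueError("biserial must be nonzero")
    # gamma is the cut point with proportion (1 - p) below it and p above it,
    # so it decreases as the proportion correct increases.
    gamma = NormalDist().inv_cdf(1.0 - p_correct)
    return gamma / biserial


if __name__ == "__main__":
    # Hypothetical item: 70% correct, biserial of 0.6.
    print(lord_a(0.6), lord_b(0.70, 0.6))
```

Because both functions are monotonic in their classical inputs, an easier item (larger proportion correct) always maps to a lower approximate b-value for a fixed positive biserial.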
In 1980, Lord added the stipulation that the formulas were applicable only to the 2PL model. Although the following studies applied the formulas under an IRT model for which they were not intended, they provide insight into how the formulas may function in the intended context.

Description of Studies

Using the formulas proposed by Lord and Novick (1968), Urry (1974) developed a graphical method. He devised graphs in which a grid of a- and b-parameter values is mapped onto a coordinate system where the population point-biserial correlation, rather than the biserial correlation, is the ordinate and the population proportion passing an item is the abscissa. By plotting the data point for a given item using the conventional indices, the values of a and b may be interpolated. When there is no guessing, the graph is symmetric. When there is guessing, the graph is displaced to reflect inflation in the proportion passing the item and attenuation in the point-biserials through error due to guessing.

Urry (1974) proposed that the following four conditions needed to be met for effective application of the graphical method: (1) the latent trait is normally distributed; (2) the classical indices are based on large samples (N = 2,000) in order to approximate the set of parameters; (3) the items in the test must be homogeneous (K-R 20 ≥ 0.90); and (4) the items in the test must be of sufficient number (n = 80) for the point-biserial correlation between item and total test score to bear a close relationship to the correlation between the item and the latent ability measured by the test. Urry examined the graphical approximations using data from 4,950 examinee responses to 98 unscreened mathematics items from the Washington Pre-College Test Battery, a highly reliable test (K-R 20 = 0.93). Correlations between the a- and b-parameters derived from the graphs and their maximum likelihood (ML) estimates were 0.89 and 0.97, respectively.
Urry concluded that the correlation coefficients indicated a strong degree of accord between the heuristic approximations and the ML estimates.

In a theoretical paper, Schmidt (1977) argued that the graphical procedure proposed by Urry tends to systematically underestimate $a_i$ and overestimate $|b_i|$ and the variance of $b_i$, because the point-biserial correlation between the item score and the estimated latent trait (i.e., total test score), $r_{i\hat{\theta}}$, is taken as an estimate of the point-biserial correlation between the binary item and the perfectly reliable latent trait, $\rho_{i\theta}$. Values of $r_{i\hat{\theta}}$ are attenuated because of guessing on item $i$ and the unreliability of the estimate of $\theta$. Schmidt argued that Urry’s four criteria for the total test score to be an estimate of the latent trait score, $\theta$, would minimize rather than eliminate the effect. Schmidt pointed out that increased values of the biserial correlation imply larger $a_i$ and smaller $|b_i|$.

The heuristic method has also been used with simulation data. Jensema (1976) simulated data and compared the parameter values set during the data generation phase with the estimates derived from the heuristic method and from ML estimation. Forty-eight data sets were created with a total of 2,800 items and 44,000 simulated examinees. True abilities of examinees were normally distributed. The simulation design consisted of: sample sizes of 250, 500, 750, and 1000; test lengths of 25, 50, and 100; a-parameters of 0.5, 1.0, 1.5, and 2.0, constant within a data set; b-parameters between -2.4 and 2.4 at intervals of 0.2; and c-parameters of 0.2. Parameter values derived from the heuristic method were used as starting values for the ML procedure. The overall correlations between the true values and the heuristic estimates were 0.80 and 0.96 for the a- and b-values, respectively, while the overall correlations between the true values and the ML estimates were 0.86 and 0.97 for the a- and b-values, respectively.
Jensema concluded that the heuristic estimates were “surprisingly accurate” (p. 713). The correlations between the true parameters and the corresponding estimates derived from the heuristic method increased with greater sample size and a greater number of test items, as initially suggested by Urry. Jensema concluded that the heuristic method may be used as a convenient technique for examining the worth of an item pool for tailored testing.

Ree (1979) also conducted a simulation study to assess the effectiveness of the heuristic method. The a- and b-values derived from the heuristic method and the a- and b-values derived from three common computer programs (i.e., ANCILLES, LOGIST, OGIVIA) were correlated with the true item parameters. Using the 3PL model, data were generated for an 80-item test for normally distributed, positively skewed, and uniformly distributed groups of 2,000 examinees. The true item parameters represented real examination data and were normally distributed (Ma = 0.95, SDa = 0.28; Mb = 0.16, SDb = 0.93; Mc = 0.20, SDc = 0.05). The correlations between estimated and true parameters revealed that the b-values were more closely aligned with the true parameters than the a-values. For the b-parameters, both the heuristic values and the values obtained from the computer programs yielded correlations equal to or higher than 0.90. For the a-parameters, correlations between the true values and the heuristic values were 0.32, 0.35, and 0.59 for the skewed, normal, and uniform ability distributions, respectively. Correlations between the true a-parameters and the values obtained from the computer programs also varied across ability distributions. The lowest correlations, ranging from 0.44 to 0.57, were observed for the skewed data, whereas high correlations were found for the normal distribution (range of 0.83 to 0.84) and the uniform distribution (range of 0.87 to 0.90).
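The general shape of such a simulation check can be sketched as follows. This is not the design used by Jensema or Ree: it assumes a two-parameter normal-ogive model with no guessing and standard normal abilities, and the names `simulate_2pl` and `heuristic_estimates` are hypothetical. The biserial is computed with the usual formula $(M_1 - M_0)/s_x \cdot pq/\phi(\gamma)$, where $\phi(\gamma)$ is the normal ordinate at the cut point.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, pstdev

STD_NORMAL = NormalDist()


def simulate_2pl(a_true, b_true, n_examinees, seed=0):
    """Generate 0/1 responses under a normal-ogive 2PL with theta ~ N(0, 1)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_examinees):
        theta = rng.gauss(0.0, 1.0)
        data.append([
            1 if rng.random() < STD_NORMAL.cdf(a * (theta - b)) else 0
            for a, b in zip(a_true, b_true)
        ])
    return data


def heuristic_estimates(data):
    """Apply Lord's approximations to classical statistics for each item.

    Assumes every item has both correct and incorrect responses and that
    each sample biserial is strictly between 0 and 1.
    """
    totals = [sum(row) for row in data]
    sd_total = pstdev(totals)
    estimates = []
    for j in range(len(data[0])):
        scores = [row[j] for row in data]
        p = mean(scores)  # classical difficulty (proportion correct)
        mean_right = mean(t for t, u in zip(totals, scores) if u == 1)
        mean_wrong = mean(t for t, u in zip(totals, scores) if u == 0)
        gamma = STD_NORMAL.inv_cdf(1.0 - p)  # cut point with area p above it
        r_bis = (mean_right - mean_wrong) / sd_total * p * (1.0 - p) / STD_NORMAL.pdf(gamma)
        estimates.append((r_bis / sqrt(1.0 - r_bis ** 2), gamma / r_bis))
    return estimates


if __name__ == "__main__":
    a_true = [0.5, 0.8] * 10
    b_true = [-1.5 + 3.0 * k / 19 for k in range(20)]
    data = simulate_2pl(a_true, b_true, n_examinees=2000, seed=1)
    for (a_hat, b_hat), a, b in zip(heuristic_estimates(data), a_true, b_true):
        print(f"a={a:.2f} a_hat={a_hat:.2f}  b={b:+.2f} b_hat={b_hat:+.2f}")
```

Comparing each estimate with its generating value directly, rather than only correlating the two sets, makes any systematic over- or under-estimation visible.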
Contributions and Shortcomings of the Studies

Even though Lord’s formulas were used in the context of the three-parameter model, the studies suggest that the transformation procedures from the classical item indices to the corresponding IRT item indices show some promise as a heuristic technique under certain conditions. General findings indicated that the b-parameters derived from the heuristic method were highly correlated with the true values or ML estimates of the b-values regardless of the shape of the ability distribution, whereas the correlations for the a-parameters were moderate to high.

Correlations were presented as evidence of the accuracy of the heuristic method in reproducing the item parameters in the studies reviewed above. However, high correlations only indicate that sets of values are strongly linearly related; they provide no evidence of parameter recovery. Ree (1981) noted that a correlation between a set of parameters and their estimates would be misleading if systematic bias were present, such as consistent over- or under-estimation.

In 1980, Lord clarified the circumstances for which the formulas were relevant by stating that they are “valid only for the case where θ is normally distributed and there is no guessing” (p. 33). The accuracy of the formulas under these two conditions has not been determined. Instead, attention has been given to the comparability of CTST and IRT item indices determined by analyzing the same data set with both models and using correlational techniques to determine the degree of association (e.g., Fan, 1998; MacDonald & Paunonen, 2002; Stage, 1998a, b, 1999). Although not directly related to the purposes of the present study, the findings of this research reveal that the difficulty and discrimination indices of the two models are linearly related.
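The point that a high correlation does not establish parameter recovery can be illustrated with a small sketch. The values below are invented for illustration only (they are not results from any study discussed here), and the helper names `pearson`, `bias`, and `rmse` are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def bias(true, est):
    """Mean signed error: negative values indicate underestimation."""
    return mean(e - t for t, e in zip(true, est))


def rmse(true, est):
    """Root mean squared error of the estimates."""
    return sqrt(mean((e - t) ** 2 for t, e in zip(true, est)))


# Hypothetical true a-values and estimates that are exactly half as large.
true_a = [0.4, 0.8, 1.2, 1.6, 2.0]
est_a = [0.5 * a for a in true_a]

if __name__ == "__main__":
    print(pearson(true_a, est_a), bias(true_a, est_a), rmse(true_a, est_a))
```

In this constructed case the correlation is essentially perfect even though every estimate is too small, which is why indices such as bias and root mean squared error are needed alongside correlations when judging recovery.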


Similar resources

The Comparison of Two Models for Evaluation of Pre-internship Comprehensive Test: Classical and Latent Trait

Introduction: Despite the widespread use of the pre-internship comprehensive test and its importance in medical students’ assessment, there is a paucity of studies that provide a systematic psychometric analysis of the items of this test. Thus, the present study sought to assess the March 2011 pre-internship test using classical and latent trait models and to compare their results. Methods: In th...


Psychometric Properties of State Level Subjective Vitality Scale based on classical test theory and Item-response theory

The purpose of the present study was to investigate the factor structure and item-response parameters of the State Level of Subjective Vitality Scale. The research design was correlational, and the statistical population consisted of students of the Shahid Beheshti University of Tehran. A sample of 240 students was selected through multi-stage sampling and completed Subjective Vitality ...


Development and Validation of a Diagnostic Test of Dyscalculia for Fifth-Grade Primary School Children

The main purpose of this study was to develop and validate a diagnostic test for dyscalculia in fifth grade in primary schools of Isfahan. For this purpose, content analysis was conducted on the content of the fifth-grade math textbook. A Krippendorff’s coefficient of 0.88 was found for the consistency of the content analysis. Based on Bloom’s (1926) Cognitive Theory, 150 questions were designed and tes...


Psychometric Properties of the Brief Form of Professor-Students Rapport Scale-based on Classical Test Theory and Item-Response Theory

Introduction: In order to improve the quality of the teaching process, it is necessary to review the professor-student rapport. The purpose of the present study was to investigate the factor structure and item-response parameters of Professor-Students Rapport Scale-Brief (PSRS-B). Methods: In a descriptive-correlation study, 497 students from Shahid Beheshti University of Medical Sciences were ...


Utility of Complex Alternatives in Multiple-Choice Items: The Case of All of the Above

This study investigated the utility of all of the above (AOTA) as a test option in multiple-choice items. It aimed at estimating item fit, item difficulty, item discrimination, and guess factor of such a choice. Five reading passages of the Key English Test (KET, 2010) were adapted. The test was reconstructed in 2 parallel forms: Test 1 did not include the abovementioned alternative, whereas Te...




Publication date: 2004